tasks operating in uncertain environments with minimum interaction with a human operator. Although many
designers have built smart robots utilizing heuristic ideas, there is no systematic approach to design such
machines [...] from the fields of computers, systems, AI and information theory [...] has been developing a
novel approach [...] based on the principle of Increasing Precision with Decreasing Intelligence (IPDI)
(Saridis 1979).

This principle, even though it resembles the managerial structure of organizational systems (Levis 1988),
has been given an analytic basis by Saridis (1988). The impact of this work is in the design of intelligent
robots, since it provides analytic techniques for universal production (blueprints) of such machines.

The purpose of the paper is to derive analytically a Boltzmann machine suitable for optimal connection
of nodes in a neural net (Fahlman, Hinton, Sejnowski, 1985). Then this machine will serve to search for
the optimal design of the Organization level of an Intelligent Machine.

In order to accomplish this, some mathematical theory of the intelligent machines will be first outlined.
Then some definitions of the variables associated with the principle, like machine intelligence, machine
knowledge and precision, will be made (Saridis, Valavanis 1988). Then a procedure to establish the Boltzmann
machine on an analytic basis will be presented and illustrated by an example in designing the organization
level of an Intelligent Machine. A new search technique, the Modified Genetic Algorithm, is presented and
proved to converge to the minimum of a cost function. Finally, simulations will show the effectiveness of a
variety of search techniques for the Intelligent Machine.

2. THE MATHEMATICAL THEORY OF INTELLIGENT CONTROLS

In order to design intelligent machines that require for their operation control systems with intelligent
functions such as simultaneous utilization of a memory, learning, or [...] to “fuzzy” or qualitative
commands, Intelligent Controls have been developed by Saridis (1977, 1983). They utilize the results of
cognitive systems research effectively with various mathematical programming [...] systems, proposed by
Saridis [...], combine the powerful high-level [...] Control Theory. This research is aimed to establish
Intelligent Controls as an engineering discipline, and it plays a central role in the design of Intelligent
Autonomous Systems.
[...] methods and [...] control systems with intelligent [...]

1. The organization level.

2. The coordination level. 

3. The execution level. 

The Organization Level is intended to perform such operations as planning and high level decision
making from long term memories. It may require high level information processing such as the knowledge [...]

Knowledge flow in an intelligent machine’s organization level represents respectively:

1. Data Handling and Management. 

2. Planning and Decision performed by the central processing units. 

3. Sensing and Data Acquisition obtained through peripheral devices. 

4. Formal Languages which define the software 


The Execution Level executes the appropriate control functions. [...]

expressed as an entropy, thus unifying the functions of an intelligent machine [...]

Optimal control theory utilizes a non-negative functional of the states of a system in the state space [...]

Entropy satisfies the additive property, and any system composed of a combination of such subsystems [...]
its total [...]. Information theoretic methods based on entropy may [...]
Since all levels of a hierarchical intelligent control can be measured by entropies and their rates, then
the optimal operation of an “intelligent machine” can be obtained through the solution of mathematical
programming problems.

The various aspects of the theory of hierarchically intelligent controls may be summarized as follows.

The theory of intelligent machines may be postulated as the mathematical problem of finding the
right sequence of decisions and controls for a system structured according to the principle of
increasing precision with decreasing intelligence (constraint) such that it minimizes its total
entropy.

The above analytic formulation of the “intelligent machine problem” as a hierarchically intelligent control
problem is based on the use of entropy as a measure of performance at all the levels of the hierarchy. It
has many advantages because of the tree-like structure of the decision making process, and brings together
functions that belong to a variety of disciplines.

3. KNOWLEDGE FLOW AND THE PRINCIPLE OF IPDI

The concept of entropy used in this paper may be generalized if one introduces the theory of evidence for
the cases in which Intelligent Machines are endowed with judgment, a very human property.

The general concepts of Intelligent Control Systems are the fundamental notions of Machine Intelligence,
Machine Knowledge, its Rate and Precision. The definitions useful in order to derive the principle of IPDI
are presented in (Saridis, Moed 1988).

Analytically, the relations may be summarised as follows: 

Knowledge (K) representing a type of information may be represented as

$$K = -\alpha - \ln p(K) = (\text{Energy}) \tag{1}$$

where p(K) is the probability density of Knowledge.

From equation (1) the probability density function p(K) satisfies the following expression in agreement
with Jaynes’ principle of Maximum Entropy (1957):

$$p(K) = e^{-\alpha - K}, \qquad \alpha = \ln \int e^{-K} \, dx \tag{2}$$
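
As a rough numerical check of equations (1) and (2), the following sketch replaces the continuous variable by a small discrete set of states, so the integral becomes a sum. The function name `knowledge_density` and the sample knowledge values are illustrative assumptions, not taken from the paper.

```python
import math

def knowledge_density(k_values):
    """Return (alpha, p) with p_j = exp(-alpha - K_j) over a discrete state set.

    Discrete stand-in (an assumption) for equation (2): the integral is
    replaced by a sum, so alpha = ln(sum_j exp(-K_j)).
    """
    alpha = math.log(sum(math.exp(-k) for k in k_values))
    p = [math.exp(-alpha - k) for k in k_values]
    return alpha, p

# Hypothetical knowledge values for three states.
K = [0.5, 1.0, 2.0]
alpha, p = knowledge_density(K)

# Equation (1) recovered from the density: K = -alpha - ln p(K).
recovered = [-alpha - math.log(pj) for pj in p]
```

With this choice of alpha the probabilities sum to one, and applying equation (1) to each probability returns the knowledge value it was built from.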

The Rate of Knowledge R, which is the main variable of an intelligent machine with discrete states, is

$$R = \frac{K}{T} = (\text{Power})$$

It was intuitively thought (Saridis 1983) that the Rate of Knowledge must satisfy the following
relation, which may be thought of as expressing the principle of Increasing Precision with
Decreasing Intelligence:

$$(MI) : (DB) \rightarrow (R) \tag{3}$$

A special case with obvious interpretation is, when R is fixed, machine intelligence is largest for a smaller
data base e.g. complexity of the process. This is in agreement with Vamos’ theory of Metalanguages (1986).

It is interesting to notice the resemblance of this entropy formulation of the Intelligent Control Problem
with the ε-entropy formulation of the metric theory of complexity originated by Kolmogorov (1956) and
applied to system theory by Zames (1979). Both methods imply that an increase in Knowledge (feedback)
reduces the amount of entropy (ε-entropy) which measures the uncertainty involved with the system.

An analytic formulation of the above principle has been derived from simple probabilistic relations among
the Rate of Knowledge, Machine Intelligence and the Data Base of Knowledge. The entropies of the various
functions come naturally into the picture as a measure of their activities.

4. THE DESIGN OF THE ORGANIZATION LEVEL OF AN INTELLIGENT MACHINE 
AS A BOLTZMANN MACHINE 

In the current literature of parallel architectures for Machine Intelligence, the Boltzmann machine 
represents a powerful, neural network based architecture that allows efficient searches to optimally obtain 
the combination of certain hypotheses of input data and constraints (Fahlman, Hinton, Sejnowski 1985). 
The Boltzmann architecture may be interpreted as the machine that searches for the optimal inter-
connection of several nodes (neurons) representing different primitive events in order to produce a string
defining an optimal task. Such a device may prove extremely useful for the design of the Organization Level
of an Intelligent Machine (Saridis, Valavanis 1988) (Figure 2).

We associate the state of each node with a binary random variable $x_i \in \{0, 1\}$, with a priori probabilities
$p(x_i = 1) = p_i$, $p(x_i = 0) = 1 - p_i$, where 1 represents the firing of neuron i, and 0 indicates that neuron i
is idle. The state vector of the network, $X = (x_1, x_2, \ldots, x_n)$, is an ordered set of 0’s and 1’s describing
the state of the machine in terms of firing/idle nodes, for an n node machine. The neurons of the machine
can be visible or hidden (Hinton, Sejnowski 1986). It is possible to extract the string of primitive events
representing the optimal task by examining the state vector of the visible nodes in the network in steady
state response to a given input.

The standard formulation of the Boltzmann machine uses Energy as a cost function which is minimized
to find the optimal state of the machine. However, in (1) we defined knowledge as a form of Energy. This is
not the function to be minimized in the Intelligent Machine. Instead, we will be minimizing the Energy of
Flow of Knowledge (F), which is the amount of knowledge which must flow through the machine in order to
accomplish a particular task. This is found by:

$$F = R \cdot T \tag{4}$$

where T is the total time of knowledge flow. By minimizing F, the Intelligent Machine reduces the
amount of Energy required to make a decision.

5. ENTROPY AS A MEASURE OF UNCERTAINTY 

Entropy is used as a measure of uncertainty in the intelligent machine. The entropy manifests itself in
the interaction and interconnection of nodes in the network. We can define the energy of flow of knowledge
into node i by:

$$F_i = \alpha_i + \frac{1}{2} \sum_j w_{ij} x_i x_j \tag{5}$$

and the probability that the machine is in a state where Energy = $F_i$ by:

$$P(F_i) = e^{-F_i} = e^{-\alpha_i - \frac{1}{2} \sum_j w_{ij} x_i x_j} \tag{6}$$

where:

$w_{ij}$ is the interconnection weight between nodes i and j
$w_{ii} = 0$
$\alpha_i$ is a probability normalizing factor which ensures $0.5 < P(F_i) < 1$

Unlike the Boltzmann machine, this formulation does not remove $\alpha_i$ when $x_i = 0$. Instead, the machine
operates from a base entropy level which it tries to reduce.

By bounding $P(F_i)$ below by 0.5, we find that the entropy of being in a state where Energy = $F_i$ increases as
$F_i$ increases. In other words, as the Energy increases, the uncertainty increases as well.

The Entropy of Knowledge Flow in the machine can be formulated as:

$$H(F) = -\sum_i P(F_i) \ln P(F_i)$$

Therefore:

$$H(F) = \sum_i \left( \alpha_i + \frac{1}{2} \sum_j w_{ij} x_i x_j \right) e^{-\alpha_i - \frac{1}{2} \sum_j w_{ij} x_i x_j}$$
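
The entropy of knowledge flow can be evaluated numerically for a small network. The sketch below assumes the flow energy has the form $F_i = \alpha_i + \frac{1}{2} \sum_j w_{ij} x_i x_j$ with $P(F_i) = e^{-F_i}$; the three-node weight matrix, the $\alpha_i$ values and the function names are hypothetical and chosen only so that $0.5 < P(F_i) < 1$.

```python
import math

def flow_energy(i, x, w, alpha):
    """F_i = alpha_i + 1/2 * sum_j w[i][j] * x[i] * x[j] (assumed form)."""
    return alpha[i] + 0.5 * sum(w[i][j] * x[i] * x[j] for j in range(len(x)))

def flow_entropy(x, w, alpha):
    """H(F) = -sum_i P(F_i) ln P(F_i), with P(F_i) = exp(-F_i)."""
    total = 0.0
    for i in range(len(x)):
        f = flow_energy(i, x, w, alpha)
        total += f * math.exp(-f)   # -P ln P equals F * exp(-F) when P = exp(-F)
    return total

# Hypothetical 3-node network: symmetric weights, zero diagonal.
w = [[0.0, 0.2, 0.1],
     [0.2, 0.0, 0.3],
     [0.1, 0.3, 0.0]]
alpha = [0.1, 0.1, 0.1]   # keeps 0 < F_i < ln 2, i.e. 0.5 < P(F_i) < 1

idle = flow_entropy([0, 0, 0], w, alpha)     # only the alpha_i terms remain
firing = flow_entropy([1, 1, 1], w, alpha)   # every interconnection contributes
```

With all nodes idle only the $\alpha_i$ terms remain, which illustrates the base entropy level; with these positive weights, firing nodes raise both the energy and the entropy.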


6. SEARCH TECHNIQUES FOR THE INTELLIGENT MACHINE 

Three random search techniques are compared here which may be used to find the minimum entropy in
the Organization Level of an intelligent machine. By examining the active visible neurons in the minimum
entropy state of the network, one can determine the sequence of primitive events which produce a string
defining an optimal task for an intelligent machine. The techniques presented here allow escape from local
entropy minima, which lead to incorrect task decisions, by randomly selecting states while searching for the
global entropy minimum.

6.1 A Genetic Algorithm Search Technique 

A technique which minimizes a system cost function is the Genetic Algorithm (Holland 1975). In
contrast to other random search techniques, the Genetic Algorithm (GA) maintains a population of points
in the space while searching for the optimum.

Here we present a modified GA which will converge in probability to the minimum cost. The standard 
GA has been changed by inserting spacer steps of an algorithm which is known to converge in probability, 
Expanding Subinterval Random Search. 

Spacer steps are defined as follows: Suppose B is an algorithm which together with a descent function
Z and solution set T converges in probability. We can define an algorithm C by $C(x) = \{y : Z(y) \le Z(x)\}$. In
other words, C applied to x can give any point so long as it does not increase the value of Z, the current
cost. B represents the spacer step, and the complex process between spacer steps is C. Thus, the overall
process amounts to repeated applications of the composite algorithm CB. CB will converge in probability
if B is repeated infinitely often and C does not increase the value of the current cost (Luenberger 1984).

We introduce the concept of “immigration” to embed ESRS into GA. Infinitely often, we insert a
randomly generated point into the GA search which forms the spacer step. The frequency of insertion is
called the “immigration rate.” By changing the “immigration rate,” the algorithm adjusts its focus from
global to local searches. This rate may be fixed dependent on the complexity of the search space, or may
vary while the search is in progress. A high “immigration rate” will force random search. A low rate will
cause the GA to dominate, focusing the search locally. Parallels can be drawn to Simulated Annealing which
starts as a near random search, and eventually becomes gradient descent. For the modified GA, the
“immigration rate” is analogous to thermal energy in Simulated Annealing. The modified algorithm described
in detail below converges in probability to the minimum cost.

6.1.1 Standard Genetic Algorithm 

In general, each point in the space is represented by a binary string and has an associated cost dictated
by the system cost function for that point. Since the makeup of the population is changed each iteration
to emphasize members (points) which minimize the cost function, a near-uniform population will develop
corresponding to a local minimum in the cost function.

The following notation is used:

$P$ = population of members (points)
$P^*$ = new population of members
$|P|$ = number of members in P
$P_k$ = kth member of the population P
$P_k(m)$ = mth bit of $P_k$
$J_k$ = cost of $P_k$
$S_k$ = probability of member k being selected from current population
$J_{\max}$ = max cost of any possible string in P
$n$ = length of $P_k$ in bits

Each iteration of the search algorithm proceeds as follows:

Repeat:

1. Compute $J_k$, $\forall P_k \in P$

2. Let $J'_k = J_{\max} - J_k$, $\forall k$. Compute $S_k = J'_k / \sum_j J'_j$

Repeat:

3.1 Randomly select $P_j, P_k$ from P based on $S_j, S_k$.

3.2 Randomly generate an index i between 1..n. Exchange the right string halves of
$P_j, P_k$ (i.e., $P'_j(i..n) = P_k(i..n)$ and $P'_k(i..n) = P_j(i..n)$). This is called “crossover” or “mating.”

3.3 Place $P'_j, P'_k$ in $P^*$. Return $P_j, P_k$ to P.

until $|P^*| = |P|$.

4. Set $P = P^*$

until $P_k \in P$ and $P_k$ has minimum cost.
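
One pass through steps 1 to 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `next_population`, the representation of members as tuples of bits, and the normalization of the selection probabilities to be proportional to $J_{\max} - J_k$ are assumptions.

```python
import random

def next_population(pop, cost, j_max, rng):
    """One iteration of steps 1-4: selection by S_k, then one-point crossover.

    pop:   list of equal-length tuples of 0/1 bits
    cost:  function mapping a member to its cost J_k (assumed <= j_max)
    j_max: upper bound on the cost of any string
    """
    n = len(pop[0])
    # Steps 1-2: J'_k = J_max - J_k, with S_k proportional to J'_k.
    fitness = [j_max - cost(p) for p in pop]
    total = sum(fitness)
    if total > 0:
        s = [f / total for f in fitness]
    else:
        s = [1.0 / len(pop)] * len(pop)   # all members equally bad: select uniformly
    new_pop = []
    while len(new_pop) < len(pop):
        # Step 3.1: draw two parents with probabilities S_j, S_k (with replacement).
        pj, pk = rng.choices(pop, weights=s, k=2)
        # Step 3.2: choose a crossover point and exchange the right-hand parts.
        i = rng.randint(1, n - 1)
        # Step 3.3: the two children go into the new population.
        new_pop.append(pj[:i] + pk[i:])
        new_pop.append(pk[:i] + pj[i:])
    return new_pop[:len(pop)]             # Step 4: this becomes the next P

# Hypothetical usage: the cost is the number of 1 bits in a 5-bit string.
rng = random.Random(3)
pop = [(0, 1, 0, 1, 1), (0, 0, 1, 1, 0), (0, 1, 1, 0, 0), (0, 0, 0, 1, 1)]
new = next_population(pop, cost=sum, j_max=5, rng=rng)
```

Because crossover only recombines existing bits, a bit position that has the same value in every parent keeps that value in every child, which is one way to see why the population can converge prematurely.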

In an attempt to prevent population convergence on a local minimum (“premature convergence”), a
“mutation” operator is added to the system. With a new generation of the population, each bit of every
member has a small probability of inverting. The inversion adds diversity to the population and promotes
search in previously unexplored regions of the space in an attempt to find the global cost minimum.

Particular aspects of this algorithm make it a powerful search tool. The “crossover” mechanism forces
search on an n-dimensional hypercube by discovering and promoting particular substrings (called “building
blocks”) which perform well. These “building blocks” combine to discover the topology of the search space,
which may not be known initially. Since the algorithm uses a population of points, many planes of the
hypercube can be searched at once, leading to “implicit parallelism.” Further, since members within a
population are independent, a new population may be formed by “mating” in parallel. Steps 3.1-3.3 can be
blocked together and generate two new members in parallel with other “mating” blocks. These features as
well as others are described in depth in (Goldberg 1989). Applications of this algorithm are presented in
(De Jong 1975, Grefenstette et al. 1985, Davis and Coombs 1987).

Heuristic algorithms within GA have been developed to avoid convergence at local minima (Mauldin
1984, Suh and Van Gucht 1987). The “SIGH” system (Ackley 1987) uses active and passive subpopulations
to escape local minima. When particular members of the population are performing poorly, they become
passive until the active subpopulation converges. If this convergence is premature, the passive members are
activated, bringing diversity and new structure to the search.

6.1.2 Modified Genetic Algorithm

Unfortunately, many of the heuristically driven GA searches perform well for a small set of functions, 
and prematurely converge for functions outside that set. However, it can be shown that under certain 
conditions, the GA will converge in probability to the global minimum of the cost function. 

The conditions are as follows:

1. Instead of (or in addition to) the “mutation” operator, an “immigration” operator is used. Introduce a
randomly generated member $P_i$ to P every M populations for some integer M > 0.

2. If $P_k \in P$ and $J_k \le J_i$ for all $P_i \in P$, then $P_k \in P^*$.

Step 1 embeds ESRS into the GA, where ESRS is algorithm B as described by (Luenberger 1984) and
stated above. Step 2 ensures $C(x) = \{y : Z(y) \le Z(x)\}$, where C is the GA. Therefore, CB, the
modified GA, converges in probability to the cost minimum.

As one can see, these conditions do not bind the algorithm severely. The “immigration” rate
(immigrations/population), 1/M, is related to the “mutation” rate (mutations/bit) as follows:

1/M = (mutations/bit) * (members/population)

In fact, the “immigration” of new members may be probabilistic, with probability 1/M.
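
The two conditions can be added to a basic GA loop as in the sketch below. It is a minimal illustration under stated assumptions rather than the authors' code: the function name `modified_ga`, the use of the current population's maximum cost in place of $J_{\max}$, the small floor on the selection weights, and the choice to overwrite the last member with the immigrant are all hypothetical.

```python
import random

def modified_ga(cost, n_bits, pop_size=20, generations=200, m=2, seed=0):
    """GA sketch with an elitist step (condition 2) and immigration (condition 1).

    cost: maps a tuple of 0/1 bits to a non-negative number to be minimized
    m:    one random immigrant is inserted every m populations (rate 1/m)
    Returns (best_member, best_cost, history), where history holds the best
    cost found in each population.
    """
    rng = random.Random(seed)

    def new_member():
        return tuple(rng.randint(0, 1) for _ in range(n_bits))

    pop = [new_member() for _ in range(pop_size)]
    history = []
    for gen in range(1, generations + 1):
        costs = [cost(p) for p in pop]
        best = pop[costs.index(min(costs))]
        history.append(min(costs))
        # Selection weights J_max - J_k; J_max is taken over the current
        # population (an assumption), floored so every weight is positive.
        j_max = max(costs)
        weights = [j_max - c + 1e-9 for c in costs]
        nxt = [best]                      # condition 2: the best member survives
        while len(nxt) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            i = rng.randint(1, n_bits - 1)    # crossover point
            nxt.append(a[:i] + b[i:])
            if len(nxt) < pop_size:
                nxt.append(b[:i] + a[i:])
        if gen % m == 0:
            nxt[-1] = new_member()        # condition 1: immigration (spacer step)
        pop = nxt
    costs = [cost(p) for p in pop]
    history.append(min(costs))
    return pop[costs.index(min(costs))], min(costs), history

# Hypothetical cost: Hamming distance to a fixed 15-bit target string.
target = (0, 1) * 7 + (1,)
hamming = lambda s: sum(a != b for a, b in zip(s, target))
best, best_cost, history = modified_ga(hamming, n_bits=15, seed=1)
```

Keeping the best member in every new population means the best cost never increases from one population to the next, which is the property condition 2 is meant to guarantee; the periodic immigrant supplies the random spacer step.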

6.2 Simulated Annealing 

One random search technique commonly used to find the global minimum cost in a Boltzmann Machine
is Simulated Annealing. This technique simulates the annealing process of metal by probabilistically allowing
uphill steps in a state-dependent cost function while finding the global cost minimum, or ground state. The
algorithm allows control of the search randomness by a user specified parameter, T. In true metal annealing,
this cost function is the Energy of the system, E, and T is the annealing temperature (Kirkpatrick et al.
1983). This method can easily be adapted for finding the minimum entropy of the Organization Level of an
intelligent machine.

Given a small random change in the system state $X_i = (x_1, x_2, \ldots, x_n)$ to $X'_i$ and the resulting
entropy change, $\Delta H$, if $\Delta H \le 0$, the change is accepted. If $\Delta H > 0$, the probability that the new state is
accepted is:

$$p(X_{i+1} = X'_i) = e^{-\Delta H / (K_B T)} \tag{16}$$

where $K_B$ is the Boltzmann Constant and T is a user set parameter. By reducing T along a schedule, called
the annealing schedule, the system should settle into a near-ground state as T approaches 0.
Another method for simulated annealing is discussed in (Hinton, Sejnowski 1986). Using this method,
if the entropy change between $X_i$ and $X'_i$ is $\Delta H = H(X'_i) - H(X_i)$, then regardless of the previous
state, accept state $X'_i$ with probability:

$$p(X_{i+1} = X'_i) = \frac{1}{1 + e^{\Delta H / T}} \tag{17}$$

Since an intelligent machine consists of a set of binary states, it should be noted that in both of the above
methods, $X'_i$ is Hamming distance 1 from $X_i$ (Kam et al. 1985).
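
The two acceptance rules and the Hamming-distance-1 move can be sketched as below. The sketch assumes $\Delta H$ is measured as the entropy of the candidate state minus the entropy of the current state, so that negative values are improvements; the function names and the default $K_B = 1$ are illustrative assumptions.

```python
import math
import random

def accept_metropolis(d_h, t, k_b=1.0):
    """Acceptance probability: 1 if dH <= 0, otherwise exp(-dH / (k_b * T))."""
    return 1.0 if d_h <= 0 else math.exp(-d_h / (k_b * t))

def accept_logistic(d_h, t):
    """Logistic acceptance probability 1 / (1 + exp(dH / T)) for any sign of dH."""
    z = d_h / t
    if z > 700:           # exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def hamming_neighbour(x, rng):
    """Return a copy of the bit tuple x with exactly one random bit flipped."""
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]

# Hypothetical usage on a 6-bit state.
rng = random.Random(0)
state = (0, 1, 1, 0, 1, 0)
candidate = hamming_neighbour(state, rng)
```

Both rules accept improvements more readily than degradations, and both approach a deterministic downhill rule as T approaches 0.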

The process of simulated annealing escapes local minima through its probabilistic random search, and
probabilistically converges to the global cost minimum under certain conditions (Geman, Geman 1984).
The next technique, Expanding Subinterval Random Search, probabilistically guarantees convergence within
a δ neighborhood to the global minimum of a specified cost function.

6.3 Expanding Subinterval Random Search

A third technique for finding the global minimum value for a cost function for a dynamic system is
Expanding Subinterval Random Search as described in (Saridis 1976). Using entropy as the cost function
and given a state $X_i$, one may define the following random search algorithm for an appropriately selected $\mu$:

$$X_{i+1} = \begin{cases} X'_i & \text{if } H(X'_i) - H(X_i) < 2\mu \\ X_i & \text{if } H(X'_i) - H(X_i) > 2\mu \end{cases} \tag{18}$$

where $H(Y)$ is the entropy induced by state $Y = (y_1, y_2, \ldots, y_n)$ and $X'_i$ is a randomly selected state vector
generated from a prespecified independent and identically distributed density function, defined by (5).

It is shown that:

$$\lim_{n \to \infty} \text{Prob}\left[ H(X_n) - H_{\min} < \delta \right] = 1 \tag{19}$$

where $H_{\min}$ is the global minimum entropy of the network. The existence of $H_{\min}$ is proven in the cited
work. This method can be used on-line to find the global minimum entropy in the Organization Level of an
intelligent machine.
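
The random search rule can be sketched for binary state vectors as follows. This is an illustration under assumptions, not the algorithm as specified in the cited work: the function name `esrs`, the independent Bernoulli sampling of each bit with probability `p_active`, and the treatment of the boundary case (a difference exactly equal to 2μ is rejected) are all hypothetical choices.

```python
import random

def esrs(entropy, n_bits, mu, iterations, p_active=0.5, seed=0):
    """Random search: accept a random candidate X' when H(X') - H(X) < 2 * mu.

    entropy: function mapping a tuple of 0/1 bits to the cost H
    Returns (final_state, trace), where trace holds H of the retained state
    after each iteration.
    """
    rng = random.Random(seed)

    def draw():
        # Each bit is drawn independently; a bit is active with probability p_active.
        return tuple(1 if rng.random() < p_active else 0 for _ in range(n_bits))

    x = draw()
    trace = [entropy(x)]
    for _ in range(iterations):
        candidate = draw()
        if entropy(candidate) - entropy(x) < 2 * mu:
            x = candidate          # candidate accepted
        trace.append(entropy(x))   # otherwise the current state is kept
    return x, trace

# Hypothetical cost: fraction of active bits in an 8-bit state, with mu = 0
# so that only strict improvements are accepted.
final, trace = esrs(lambda s: sum(s) / len(s), n_bits=8, mu=0.0, iterations=500)
```

With μ = 0 the retained cost never increases; a positive μ would also admit small uphill moves.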

7. EXPERIMENTAL RESULTS 

7.1 Simulation of Search Techniques

A net was created which recognized strings of 15 bit binary numbers. The net was formulated using the
standard Energy methods found in (Hinton, Sejnowski 1986). Energy was used instead of Entropy in these
simulations for two reasons. First, to compare the results of this simulation to the results of simulations by
other researchers, a standard measure had to be used. Second, the method for creating regions of attraction
in an Entropy based net is still being investigated.

The net had three Energy minima, corresponding to states (001010100100100, 110110110001101,
001111101100010). The respective Energies for these three states were (0.8, 0.6, 1.0). Each simulation
technique attempts to find the global Energy minimum of the net, which was 0.6. The cases presented here
show best and worst performance of each technique over 10 trials. Other cases which varied the depth and
width of the Energy wells are presented in (Saridis and Moed, 1988). For this experiment, the wells were
[...].

The modified Genetic Algorithm was performed as presented in Section 6.1. The population was
set at 20 members. Each member was 15 bits long, so the number of bits in each population was 300. The
“immigration rate” was set to 0.5, which corresponds to a mutation rate of 0.025.

Simulated Annealing was performed using the acceptance criterion in (17). The system was cooled in
accordance with:

$$\frac{T_i(t)}{T_0} = \frac{1}{\log(10 + t)}$$

where $T_i(t)$ = temperature at time t
$T_0$ = initial temperature.

The net state changed in Hamming distance 1 increments.
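
A logarithmic cooling schedule of this kind can be written as a one-line function. The sketch assumes the schedule has the form T(t) = T0 / log(10 + t) and that the logarithm is base 10, so the temperature at t = 0 equals T0; both the base and the function name `temperature` are assumptions.

```python
import math

def temperature(t, t0):
    """Cooling schedule T(t) = T0 / log10(10 + t); base 10 is an assumption."""
    return t0 / math.log10(10 + t)

# Hypothetical initial temperature, sampled at a few iteration counts.
schedule = [temperature(t, 5.0) for t in (0, 90, 990, 9990)]
```

Under this assumption the temperature falls very slowly: each further halving relative to T0 requires squaring the quantity 10 + t.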

Expanding Subinterval Random Search (Saridis 1976) was slightly modified to reinforce the
probabilistic selection of node states which reduced the Energy in the net. The probability of a node being
active was initially 0.5. When the Energy was reduced during search, the probability of the node being
reactivated became

$$P(x_i = 1) = P(x_i = 1) + [1.0 - P(x_i = 1)] * 0.1$$

if the node was active, or

$$P(x_i = 1) = P(x_i = 1) - P(x_i = 1) * 0.1$$

if the node was inactive.
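
The reinforcement rule is a simple linear update of each node's activation probability. In the sketch below the function name `reinforce` and the `step` parameter (set to the 0.1 used above) are illustrative.

```python
def reinforce(p_active, was_active, step=0.1):
    """Update P(x_i = 1) after a search step that reduced the Energy.

    An active node moves a fraction `step` of the way toward 1;
    an inactive node moves a fraction `step` of the way toward 0.
    """
    if was_active:
        return p_active + (1.0 - p_active) * step
    return p_active - p_active * step

# Starting from the initial probability 0.5:
up = reinforce(0.5, True)      # about 0.55
down = reinforce(0.5, False)   # about 0.45
```

Because each update moves only a fixed fraction of the remaining distance, the probability stays strictly between 0 and 1 no matter how many times a node is reinforced.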

Figures 3-8 present the best and worst performance of each algorithm over 10 trials. The Modified GA
found the minimum Energy string between the 20th and 180th population. Since there were 20 strings per
population, this indicates that between 400 and 3600 points had to be generated. The best performance
by Simulated Annealing required over 5500 iterations. The worst performance did not converge in 12000
iterations (the most attempted). As a guideline, the best performance of the random search ESRS was
slightly over 2000 iterations. The worst performance did not converge in 12000 iterations. The results of
these limited experiments call for a closer examination of the Modified Genetic Algorithm as a search
technique for minimizing the Energy in a Boltzmann machine.

8. CONCLUSIONS 

A mathematical theory for intelligent machines was proposed and traced back to its origins. The
methodology was developed to formulate the “intelligent machine,” of which an intelligent robot system is a
typical example, as a mathematical programming problem using the aggregated entropy of the system as
its performance measure. The levels of the machine structured according to the Principle of Increasing
Precision with Decreasing Intelligence can adapt performance measures easily expressed as entropies.
This work establishes an analytic formulation of the Principle, provides entropy measures for the account of
the underlying activities, and integrates it with the main theory of “Intelligent Machines.” Optimal solutions
of the problem of the “intelligent machine” can be obtained by minimizing the overall entropy of the system.

This formulation was proven to be applicable to the derivation and design of parallel architectures for
Machine Intelligence. The Boltzmann machine was analytically derived from the definitions of knowledge
flow and Jaynes’ principle of maximum entropy. The Modified Genetic Algorithm was presented as a search
technique which converged in probability to the minimum of a specified cost function. Three techniques,
the Modified Genetic Algorithm, Simulated Annealing, and Expanding Subinterval Random Search, were
described as methods to find the global minimum Energy of a Boltzmann Machine. Simulations using these
search techniques were conducted, and results indicate that the modified Genetic Algorithm may be an
efficient method to find the minimum Energy.